DETAILED ACTION

1. This Office Action is in response to the Arguments filed on 10/16/2007. Claims
1-11, 13, 15-17, and 22-24 remain pending, with claims 12, 14, and 18-21 having been
cancelled. All pending claims have been examined. The Applicants' amendment and
remarks have been carefully considered, but they were not persuasive and do not place
the claims in condition for allowance. Accordingly, this Action has been made FINAL.

2. All previous objections and rejections directed to the Applicant's disclosure and 
claims not discussed in this Office Action have been withdrawn by the Examiner. 

Response to Arguments 

3. Applicant's arguments (pages 2-5) filed on 10/16/2007 with regard to claims
1-11, 13, 15-17, and 22-24 have been fully considered, but they are not persuasive.

As to claims 1, 15, and 16, the Applicants argue that Crepy et al. and Valles are
not combinable because they disclose materially different systems, and one of ordinary
skill in the art would not have combined the two references. More specifically, the
Applicants argue that the Crepy reference is directed more toward a text-to-speech
system, while the Valles reference is directed more toward a conversational system in
which speech is input through a microphone. The Examiner respectfully disagrees.
The Valles reference is cited to teach categorizing words, phrases or utterances by
grammar type in a grammar sub-tree (see [0143] and Figures 2, 3, and 4). Further, it
should be noted that the categorizing is not performed on the speech itself; rather, the
speech is converted into textual form for the formation of a tree, as seen in paragraphs
[0034] and [0086] of Valles. The use of hierarchical structures enables the semantics of
the natural language to be represented and used for categorization (see Valles, [0143]).
Furthermore, the combination of the two references allows the text that is input in the
Crepy et al. reference to be categorized upon being input into the TTS engine. The
categorization makes the semantic information of the text known.

Application/Control Number: 10/647,709 Page 3

Art Unit: 2626

Claim Rejections - 35 USC § 103 

4. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth
in section 102 of this title, if the differences between the subject matter sought to be patented and the 
prior art are such that the subject matter as a whole would have been obvious at the time the invention 
was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

5. Claims 1, 15, and 16 are rejected under 35 U.S.C. 103(a) as being unpatentable
over Crepy et al. (US 6,622,121) in view of Valles et al. (US 2004/0083092).

As to claim 1, Crepy et al. teaches a method for testing and improving the
performance of a speech recognition engine, comprising: 

loading into a memory location one or more words, phrases or utterances 
(see col. 2, lines 64-66 and col. 3, lines 36-39); 

identifying one or more of the words, phrases or utterances for recognition
by a speech recognition engine (see col. 3, lines 36-39 and col. 3, lines 40-43)
(e.g., the reference text, which consists of words, is identified and will be passed
to the speech recognition engine);
extracting the one or more words, phrases or utterances in a selected
grammar sub-tree via a vocabulary extractor module and passing the extracted
one or more identified words, phrases or utterances to a text-to-speech
conversion module that provides an audio formatted pronunciation of each word,
phrase, or utterance (see col. 3, lines 36-46 and col. 1, lines 65-67) (e.g., the
extracted words come from the reference text, which is then fed into the
text-to-speech engine. An audio representation is produced as a result of the
conversion of the text into speech);

passing the audio pronunciation of each of the identified one or more
words, phrases or utterances from the text-to-speech conversion module to the
speech recognition engine (see col. 4, lines 59-65 and Figure 4, elements 404
and 406);

creating a recognized word, phrase or utterance for each audio
pronunciation passed to the speech recognition engine (see col. 4, lines 59-65)
(e.g., the words are recognized from the audio file and then compared); and

analyzing each recognized word, phrase or utterance created by the
speech recognition engine to determine how closely each created recognized
word, phrase or utterance approximates the respective audio pronunciation from
which each created recognized word, phrase or utterance is derived (see col. 4,
line 65 to col. 5, line 11) (e.g., the recognized words are compared with the actual
words using a word error rate (WER) calculation).
However, Crepy et al. does not specifically teach categorizing by grammar
type, where utterances of the same grammar type are grouped together in a
grammar sub-tree.

Valles does teach categorizing the one or more words, phrases or
utterances by grammar type whereby all words, phrases or utterances of a same
grammar type are grouped together in a grammar sub-tree (see [0143] and
Figures 2, 3, and 4) (e.g., the cited sections show that a grammar is employed
for categorizing the words that are input. Further, a table of synonyms and other
attributes is disclosed).

It would have been obvious to one of ordinary skill in the art at the time
the invention was made to have modified the improving of speech recognition as
taught by Crepy et al. with the inclusion of categorizing words according to a
specific grammar as taught by Valles. The motivation to have combined the
references involves the ability to express the semantics of a natural language and
to categorize the concepts in a phrase (see Valles, [0142]-[0143]). Hence, the
extraction of words based on the grammar category chosen, as taught by Valles,
would have been obvious.



As to claim 15, Crepy et al. in view of Valles teach all the claimed limitations as
applied to claim 1 above.

Furthermore, Valles teaches that a plurality of grammar sub-trees are grouped
together to form a grammar tree containing all of the one or more words,
phrases, or utterances (see Figures 2-4 and [0143]-[0146]) (e.g., various grammar
sub-trees exist depending on the category to which the word is determined to
belong. Further, multiple related words may exist in the category, which may be
retrieved from a stored location).

As to claim 16, Crepy et al. in view of Valles teach all the claimed limitations as
applied to claim 1 above.

Furthermore, Valles teaches that identifying an utterance for recognition
by the speech recognition engine (see Crepy et al., Figure 4, element 406)
includes selecting the grammar sub-tree containing the one or more words,
phrases, or utterances (see [0143] and Figures 2 and 3).

6. Claims 2-6, 10, 11, and 22 are rejected under 35 U.S.C. 103(a) as being
unpatentable over Crepy et al. (US 6,622,121) in view of Valles et al. (US
2004/0083092) as applied to claim 1 above, and further in view of Raud et al. (US
6,125,341).

As to claim 2, Crepy et al. in view of Valles teach evaluating the recognized
words based on a word error rate.

However, Crepy et al. in view of Valles do not specifically teach the
assignment of a confidence score to each utterance, phrase, or word.

Raud et al. teaches assigning a confidence score to each utterance,
phrase or word (see col. 6, lines 8-21).
It would have been obvious to one of ordinary skill in the art at the time
the invention was made to have modified the improving of speech recognition as
taught by Crepy et al. in view of Valles with the inclusion of assigning a confidence
score as taught by Raud et al. The motivation to have combined the references
involves the ability to determine whether the current vocabulary is appropriate for
recognizing words and to determine whether a word is properly recognized (see
Raud et al., col. 6, lines 8-13).



As to claim 3, Crepy et al. in view of Valles in view of Raud et al. teach all the
claimed limitations as applied to claims 1 and 2 above.

Furthermore, Raud et al. teaches assigning a confidence score to each
recognized utterance based on a confidence level associated with the utterance
based on prior speech recognition engine training (see Raud et al., col. 6, line 8)
(e.g., it is obvious that the confidence score is compared against a threshold for
recognition accuracy; see col. 6, lines 23-31).

As to claims 4 and 10, Crepy et al. in view of Valles in view of Raud et al.
teach all the claimed limitations as applied to claims 1 and 3 above.

Furthermore, Raud et al. teaches determining whether the recognized
utterance is the same as the utterance derived by the speech recognition engine
based on a prior speech recognition training confidence level (see Raud et al.,
col. 4, lines 33-35) (e.g., it should be noted that a vocabulary is used for checking
whether there is a match. An initial vocabulary is used, and other vocabularies are
then used for subsequent words not found or recognized using the initial
vocabulary (see col. 5, lines 46-56). It is inherent that the words from the
vocabulary and the words from the utterance are matched for similarity).



As to claims 5 and 11, Crepy et al. in view of Valles in view of Raud et al. teach
all the claimed limitations as applied to claims 1 and 2 above.

Furthermore, Raud et al. teaches designating the recognized utterance as
accurately recognized by the speech recognition engine if the confidence score
exceeds an acceptable level (see Raud et al., col. 5, lines 18-30).

As to claim 6, Crepy et al. in view of Valles in view of Raud et al. teach all the
claimed limitations as applied to claims 1, 2, and 5 above.

Furthermore, Raud et al. teaches that if the confidence score is less than a
certain value, a modification is made to the speech recognition engine to
recognize the word (see col. 6, lines 8-31) (e.g., if the confidence level is less
than a value, the system requests verification from a user or asks a question to
remove any ambiguity. This is seen as a modification to the speech recognition
engine to interpret the utterance. Further, other vocabularies are used to
determine whether an increase in performance can be obtained).
7. Claim 7 is rejected under 35 U.S.C. 103(a) as being unpatentable over Crepy et
al. in view of Valles and Raud et al. as applied to claim 5 above, and further in view of
Bickley et al. (US 7,013,276).

As to claim 7, Crepy et al., Valles and Raud et al. teach improving the
performance of a speech recognition engine.

However, Crepy et al., Valles and Raud et al. do not specifically teach the 
notification to a developer when the score is lower than a threshold value. 

Bickley et al. teaches an alert mechanism for words that are similar and are
subject to confusion (see col. 10, lines 63-65) based on a threshold calculation (see
col. 10, lines 38-40).

It would have been obvious to one of ordinary skill in the art to modify
the speech recognition performance methods as taught by Crepy et al., Valles
and Raud et al. with the use of a notification sent to a software developer when a
value is below a threshold, as taught by Bickley et al. The motivation to combine
these references involves the ability to distinguish between similar words, which
may not be recognized by speech recognition engines (see Bickley et al., col. 2,
lines 27-36).

8. Claim 17 is rejected under 35 U.S.C. 103(a) as being unpatentable over Crepy et
al. in view of Valles as applied to claim 1 above, and further in view of Kennewick et al.
(US 2004/0044516).
As to claim 17, Crepy et al. in view of Valles teach all the claimed limitations as
applied to claim 1 above.

Furthermore, Crepy et al. teaches that creating a recognized word,
phrase, or utterance for each respective audio pronunciation includes converting
each respective audio pronunciation from an audio format to a digital format by
the speech recognition engine (see Crepy et al., col. 4, lines 56-64) (e.g., the audio
form of the file is converted into digital form, and the words contain an implied
pronunciation).

However, Crepy et al. in view of Valles do not specifically teach the 
analyzing phonetically each respective audio pronunciation of each of the one or 
more recognized word, phrase or utterance. 

Kennewick et al. does teach analyzing phonetically each respective audio
pronunciation of each of the one or more recognized words, phrases or utterances
(see [0151]).

It would have been obvious to one of ordinary skill in the art at the time
the invention was made to have modified the improving of speech recognition as
taught by Crepy et al. and Valles with the inclusion of analyzing the phonetics of
each audio pronunciation as taught by Kennewick et al. The motivation to have
combined the references involves the ability to add pronunciations not present in
the dictionary in order to increase speech recognition accuracy and learning (see
Kennewick et al., [0151]).
9. Claim 22 is rejected under 35 U.S.C. 103(a) as being unpatentable over Crepy et
al. (US 6,622,121) in view of Valles et al. (US 2004/0083092) and further in view of
Raud et al. (US 6,125,341).

As to claim 22, Crepy et al. teaches a method for testing and improving the
performance of a speech recognition engine, comprising:

identifying one or more of the words, phrases or utterances for recognition
by a speech recognition engine (see col. 3, lines 36-39 and col. 3, lines 40-43)
(e.g., the reference text, which consists of words, is identified and will be passed
to the speech recognition engine);

creating and passing the audio pronunciation of each of the identified one
or more words, phrases or utterances, from the text-to-speech conversion
module to the speech recognition engine that provides an audio formatted
pronunciation of each of the identified words, phrases, or utterances to the
speech recognition engine (see col. 4, lines 59-65 and Figure 4, elements 404
and 406) (e.g., the cited section shows that an audio version of the input text is
created and passed to the speech recognition engine);

deriving a recognized word, phrase or utterance for each audio
pronunciation passed to the speech recognition engine (see col. 4, line 65 to
col. 5, line 11) (e.g., the recognized words are compared with the actual words
using a word error rate (WER) calculation);

However, Crepy et al. does not specifically teach categorizing by grammar
type, where utterances of the same grammar type are grouped together in a
grammar sub-tree.

Valles does teach categorizing the one or more words, phrases or
utterances by grammar type whereby all words, phrases or utterances of a same
grammar type are grouped together in a grammar sub-tree (see [0143] and
Figures 2, 3, and 4) (e.g., the cited sections show that a grammar is employed
for categorizing the words that are input. Further, a table of synonyms and other
attributes is disclosed).

It would have been obvious to one of ordinary skill in the art at the time
the invention was made to have modified the improving of speech recognition as
taught by Crepy et al. with the inclusion of categorizing words according to a
specific grammar as taught by Valles. The motivation to have combined the
references involves the ability to express the semantics of a natural language and
to categorize the concepts in a phrase (see Valles, [0142]-[0143]). Hence, the
extraction of words based on the grammar category chosen, as taught by Valles,
would have been obvious.

However, Crepy et al. in view of Valles do not specifically teach the
assignment of a confidence score to each utterance, phrase, or word.
Raud et al. teaches assigning a confidence score to each utterance,
phrase or word (see col. 6, lines 8-21) based on prior training of the speech
recognition engine to recognize similar or same words, phrases or utterances as
each derived recognized word, phrase or utterance (see Raud et al., col. 4,
lines 33-35) (e.g., it should be noted that a vocabulary is used for checking
whether there is a match. An initial vocabulary is used, and other vocabularies are
then used for subsequent words not found or recognized using the initial
vocabulary (see Raud et al., col. 5, lines 46-56). It is inherent that the words from
the vocabulary and the words from the utterance are matched for similarity); and

if the confidence score is less than an acceptable threshold, modifying the
speech recognition engine to recognize with higher accuracy the word, phrase or
utterance from which the derived recognized word, phrase or utterance is derived
(see col. 5, lines 31-38 and col. 6, lines 22-51).

It would have been obvious to one of ordinary skill in the art at the time
the invention was made to have modified the improving of speech recognition as
taught by Crepy et al. and Valles with the inclusion of assigning a confidence score
as taught by Raud et al. The motivation to have combined the references
involves the ability to determine whether the current vocabulary is appropriate for
recognizing words and to determine whether a word is properly recognized (see
Raud et al., col. 6, lines 8-13).
10. Claims 8 and 23 are rejected under 35 U.S.C. 103(a) as being unpatentable over
Crepy et al., Valles and Raud et al. as applied to claims 6 and 22 above, and further in 
view of Kennewick et al. (US 2004/0044516). 

As to claims 8 and 23, Crepy et al., Valles and Raud et al. teach all the claimed
limitations as applied to claims 1, 5, and 6 above and claim 22. Furthermore, Raud et al.
teaches assigning a confidence score and, if the score is less than a threshold, obtaining
an acceptable confidence score upon the next pass through the engine (see col. 7, lines
20-25).

However, Crepy et al., Valles and Raud et al. do not specifically teach the 
altering of the audio pronunciation with the confidence score less than an 
acceptable threshold. 

Kennewick et al. does teach altering the audio pronunciation of the
word, phrase, or utterance associated with the confidence score that is less than
an acceptable confidence score threshold level such that the altered audio
pronunciation obtains an acceptable confidence score upon the next pass through
the speech recognition engine (see [0151]) (e.g., the speech recognition engine
is adaptive based on the confidence levels and the pronunciation of the
recognized word).

It would have been obvious to one of ordinary skill in the art at the time
the invention was made to have modified the improving of speech recognition as
taught by Crepy et al., Valles and Raud et al. with the inclusion of altering the
audio pronunciation of the recognized word as taught by Kennewick et al. The
motivation to have combined the references involves the ability to improve the
accuracy of the speech recognition engine as well as the ability of the speech
recognition engine to learn over time (see Kennewick et al., [0151]).

11. Claims 9 and 24 are rejected under 35 U.S.C. 103(a) as being unpatentable over
Crepy et al. in view of Valles and in view of Raud et al. as applied to claims 6 and 22
above, and further in view of Roberts et al. (US 6,999,930).

As to claims 9 and 24, Crepy et al. in view of Valles in view of Raud et al. teach all
the claimed limitations as applied to claims 1, 5, and 6 above and claim 22.
Furthermore, Raud et al. teaches the use of a confidence score (see col. 6, lines 23-31).

Crepy et al. in view of Valles in view of Raud et al. do not specifically
teach reducing the confidence score threshold level.

However, Roberts et al. does teach reducing the confidence score
threshold level (see col. 10, lines 50-60).

It would have been obvious to one of ordinary skill in the art at the time
the invention was made to have modified the improving of speech recognition as
taught by Crepy et al. in view of Valles in view of Raud et al. with the inclusion of
reducing the acceptable confidence score threshold level as taught by
Roberts et al. The motivation to have combined the references involves the
ability to generate more potential matches even when the confidence level is low
(see Roberts et al., col. 10, lines 57-60).
Conclusion

12. THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time
policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE 
MONTHS from the mailing date of this action. In the event a first reply is filed within 
TWO MONTHS of the mailing date of this final action and the advisory action is not 
mailed until after the end of the THREE-MONTH shortened statutory period, then the 
shortened statutory period will expire on the date the advisory action is mailed, and any 
extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of
the advisory action. In no event, however, will the statutory period for reply expire later 
than SIX MONTHS from the mailing date of this final action. 

13. Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Paras Shah whose telephone number is (571)270-1650. 
The examiner can normally be reached on MON.-THURS. 7:30a.m.-4:00p.m. EST. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Patrick Edouard can be reached on (571)272-7603. The fax phone number 
for the organization where this application or proceeding is assigned is 571-273-8300. 
Information regarding the status of an application may be obtained from the
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 